
feat(quantized): GGUF-compat Q4_0 quant/dequant for burn QuantValue::Q4F/Q4S (sprint A5) #120

Merged
AdaWorldAPI merged 1 commit into master from claude/burn-A5-q4-quant on Apr 30, 2026

Conversation

@AdaWorldAPI
Owner

Summary

Sprint A5 of burn-ndarray parity sprint v1. Closes item (11) of the parity list — Q4 quant helpers needed for burn's QuantValue::Q4F / Q4S.

Existing quantize_f32_to_i4 audit

Pre-existing impl at src/hpc/quantized.rs:355:

  • pub fn quantize_f32_to_i4(data: &[f32]) -> (Vec<u8>, QuantParams) — already public
  • Per-tensor symmetric: single scale = abs_max / 7.0, zero_point = 0
  • Packing: low nibble first (element 0 → low nibble of byte 0; element 1 → high nibble of byte 0; consecutive layout)
  • Range clamped to [-8, 7], sign-extended on dequant
  • Does NOT match GGUF Q4_0 (different block size, scale formula, and packing layout)
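For reference, the audited per-tensor scheme can be sketched as below. The function names and signatures here are illustrative assumptions, not the actual code in src/hpc/quantized.rs; the sketch only mirrors the behavior the bullets describe (single scale = abs_max / 7, low-nibble-first consecutive packing, clamp to [-8, 7], sign-extend on dequant):

```rust
/// Illustrative sketch of the pre-existing per-tensor symmetric i4 scheme
/// (hypothetical names; not the repo's quantize_f32_to_i4 implementation).
fn quantize_i4_sketch(data: &[f32]) -> (Vec<u8>, f32) {
    let abs_max = data.iter().fold(0.0f32, |m, &x| m.max(x.abs()));
    let scale = if abs_max == 0.0 { 1.0 } else { abs_max / 7.0 };
    let mut packed = vec![0u8; data.len().div_ceil(2)];
    for (i, &x) in data.iter().enumerate() {
        // Clamp to [-8, 7] and keep the low 4 bits of the two's-complement value.
        let q = (x / scale).round().clamp(-8.0, 7.0) as i8;
        let nibble = (q as u8) & 0x0F;
        if i % 2 == 0 {
            packed[i / 2] |= nibble; // element 0 -> low nibble of byte 0
        } else {
            packed[i / 2] |= nibble << 4; // element 1 -> high nibble of byte 0
        }
    }
    (packed, scale)
}

fn dequantize_i4_sketch(packed: &[u8], n: usize, scale: f32) -> Vec<f32> {
    (0..n)
        .map(|i| {
            let byte = packed[i / 2];
            let nibble = if i % 2 == 0 { byte & 0x0F } else { byte >> 4 };
            // Sign-extend the 4-bit value back to i8 before scaling.
            let q = ((nibble as i8) << 4) >> 4;
            q as f32 * scale
        })
        .collect()
}
```

Note the consecutive (non-interleaved) layout: this is exactly what makes the scheme byte-incompatible with GGUF Q4_0, as stated above.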

Decision

Option (a) additive — kept existing quantize_f32_to_i4 untouched (no breaking change to existing callers). Added new GGUF-compat functions alongside.

What's new (+211 LOC)

src/hpc/quantized.rs:466-676:

pub const Q4_0_BLOCK_SIZE: usize = 32;
pub const Q4_0_BYTES_PER_BLOCK: usize = 16;

/// Q4_0 packing — GGUF / llama.cpp compatible.
/// Per 32-element block: scale `d = max_signed / -8`, packed as 16 bytes
/// where byte `j` holds element `j` (low nibble) and `j+16` (high nibble).
pub fn quantize_f32_to_q4_0(data: &[f32]) -> (Vec<u8>, Vec<f32>);

/// Inverse — asserts on (packed.len(), scales.len()) consistency.
pub fn dequantize_q4_0_to_f32(packed: &[u8], scales: &[f32]) -> Vec<f32>;

The packing layout is the exact GGUF Q4_0 interleave (not the linear layout quantize_f32_to_i4 uses), matching what llama.cpp produces.
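As a point of comparison, here is a minimal sketch of the Q4_0 scheme described above. The constant names and function signatures follow the PR, but the bodies are an illustrative reconstruction rather than the merged code, and they use llama.cpp's truncating `x*id + 8.5` rounding:

```rust
pub const Q4_0_BLOCK_SIZE: usize = 32;
pub const Q4_0_BYTES_PER_BLOCK: usize = 16;

/// Sketch of GGUF Q4_0 quantization: per 32-element block, one f32 scale
/// d = max_signed / -8 plus 16 packed bytes with the interleaved layout
/// (element j -> low nibble of byte j, element j+16 -> high nibble).
pub fn quantize_f32_to_q4_0(data: &[f32]) -> (Vec<u8>, Vec<f32>) {
    assert!(data.len() % Q4_0_BLOCK_SIZE == 0, "input must be a multiple of 32");
    let n_blocks = data.len() / Q4_0_BLOCK_SIZE;
    let mut packed = vec![0u8; n_blocks * Q4_0_BYTES_PER_BLOCK];
    let mut scales = vec![0f32; n_blocks];
    for (b, block) in data.chunks_exact(Q4_0_BLOCK_SIZE).enumerate() {
        // max_signed: the element with the largest magnitude, sign preserved.
        let max_signed = block
            .iter()
            .fold(0.0f32, |m, &x| if x.abs() > m.abs() { x } else { m });
        let d = max_signed / -8.0;
        let id = if d != 0.0 { 1.0 / d } else { 0.0 };
        scales[b] = d;
        for j in 0..Q4_0_BYTES_PER_BLOCK {
            // llama.cpp rounding: truncate x*id + 8.5, clamp to [0, 15].
            // (Rust's `as u8` saturates negative floats to 0.)
            let lo = ((block[j] * id + 8.5) as u8).min(15);
            let hi = ((block[j + Q4_0_BYTES_PER_BLOCK] * id + 8.5) as u8).min(15);
            packed[b * Q4_0_BYTES_PER_BLOCK + j] = lo | (hi << 4);
        }
    }
    (packed, scales)
}

pub fn dequantize_q4_0_to_f32(packed: &[u8], scales: &[f32]) -> Vec<f32> {
    assert_eq!(packed.len(), scales.len() * Q4_0_BYTES_PER_BLOCK);
    let mut out = vec![0f32; scales.len() * Q4_0_BLOCK_SIZE];
    for (b, &d) in scales.iter().enumerate() {
        for j in 0..Q4_0_BYTES_PER_BLOCK {
            let byte = packed[b * Q4_0_BYTES_PER_BLOCK + j];
            // Stored nibbles are unsigned with a bias of 8.
            out[b * Q4_0_BLOCK_SIZE + j] = ((byte & 0x0F) as f32 - 8.0) * d;
            out[b * Q4_0_BLOCK_SIZE + j + Q4_0_BYTES_PER_BLOCK] =
                ((byte >> 4) as f32 - 8.0) * d;
        }
    }
    out
}
```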

Tests (6 new, 17/17 pass)

  • test_i4_boundary_values — exact boundaries at ±7 (scale=1.0) and clamp at ±8 (scale=8/7)
  • test_q4_0_roundtrip_single_block — 32 floats round-trip
  • test_q4_0_roundtrip_multi_block — 3-block (96 floats)
  • test_q4_0_zero_block — d=0 edge case
  • test_q4_0_packing_layout_interleaved — asserts byte j holds elements j and j+16
  • test_q4_0_requires_block_aligned — #[should_panic] for non-32-multiple input

Acceptance

  • cargo build: clean (existing 39 warnings, none new)
  • cargo test --lib quantized: 17 passed, 0 failed
  • cargo fmt --check: only pre-existing diffs in code A5 didn't touch; new code is fmt-clean

Plan reference

Notes

The commit is GPG-signed (A5 worked around the environment's codesign-helper quirk by mirror-committing in /home/user/ndarray and fetching the SHA into its worktree), using the same key as recent master commits.

https://claude.ai/code/session_01NYGrxVopyszZYgLBxe4hgj


Generated by Claude Code


@chatgpt-codex-connector (bot) left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 7609ccd67f


Comment thread: src/hpc/quantized.rs, lines +526 to +529
let lo = ((block[j] * id).round() + 8.5).floor().clamp(0.0, 15.0) as u8;
let hi = ((block[j + Q4_0_BYTES_PER_BLOCK] * id).round() + 8.5)
.floor()
.clamp(0.0, 15.0) as u8;


P1: Use GGUF Q4_0 quantizer rounding rule

quantize_f32_to_q4_0 currently computes each nibble with ((x * id).round() + 8.5).floor(), but GGUF/llama.cpp Q4_0 uses truncation of x * id + 8.5 (effectively floor(x * id + 8.5) for this nonnegative range). These are not equivalent for negative half-step inputs (e.g. x*id = -0.5 gives 7 here vs 8 in GGUF), so this can produce different packed bytes from the same weights and break the advertised byte-level compatibility with existing Q4_0 tensors.

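To make the flagged difference concrete, the two rounding expressions can be isolated as below. The helper names are hypothetical: nibble_pr mirrors the PR's expression from the quoted lines, nibble_gguf the llama.cpp truncation rule; v stands for the scaled value x*id:

```rust
/// The PR's expression: round to nearest (half away from zero), add the
/// bias, then floor. Hypothetical helper name for illustration.
fn nibble_pr(v: f32) -> u8 {
    ((v.round() + 8.5).floor().clamp(0.0, 15.0)) as u8
}

/// The GGUF/llama.cpp rule: truncate v + 8.5 (floor for this nonnegative
/// range) and clamp to 15. Hypothetical helper name for illustration.
fn nibble_gguf(v: f32) -> u8 {
    ((v + 8.5) as u8).min(15)
}
```

At v = -0.5, Rust's round-half-away-from-zero gives round(-0.5) = -1, so nibble_pr yields 7, while nibble_gguf computes 8.0 as u8 = 8 — exactly the divergence the review describes; away from half-steps (e.g. v = -1.0) both yield 7.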

Add quantize_f32_to_q4_0 / dequantize_q4_0_to_f32 implementing the
GGUF / llama.cpp per-32-element block scheme: 16 packed bytes plus
one f32 scale d = max_signed/-8 per block, with the canonical
interleaved nibble layout (element j -> low nibble of byte j;
element j+16 -> high nibble of byte j).

The existing per-tensor quantize_f32_to_i4 (low-nibble-first,
non-interleaved, scale = abs_max/7) is preserved unchanged for
backwards compatibility. Burn QuantValue::Q4F / Q4S callers can
opt into either scheme.

Tests: i4 boundary +/-7 and clamp +/-8; Q4_0 single-block,
multi-block, zero-block, interleaved layout, non-aligned panic.

https://claude.ai/code/session_01NYGrxVopyszZYgLBxe4hgj
@AdaWorldAPI force-pushed the claude/burn-A5-q4-quant branch from 7609ccd to 376aacb on April 30, 2026 09:51
@AdaWorldAPI merged commit 035dc41 into master on Apr 30, 2026
5 of 10 checks passed



2 participants